Scatterplots with ggplot2

Scatterplots draw one point per observation, which lets us spot possible correlations between two features of a data set. Let's see how we can create them with ggplot2!

We'll use the built-in mtcars dataset:

In [10]:
library('ggplot2')
df <- mtcars
In [13]:
head(df)
Out[13]:
                   mpg   cyl  disp  hp   drat  wt     qsec   vs  am  gear  carb
Mazda RX4          21    6    160   110  3.9   2.62   16.46  0   1   4     4
Mazda RX4 Wag      21    6    160   110  3.9   2.875  17.02  0   1   4     4
Datsun 710         22.8  4    108   93   3.85  2.32   18.61  1   1   4     1
Hornet 4 Drive     21.4  6    258   110  3.08  3.215  19.44  1   0   3     1
Hornet Sportabout  18.7  8    360   175  3.15  3.44   17.02  0   0   3     2
Valiant            18.1  6    225   105  2.76  3.46   20.22  1   0   3     1

qplot()

In [14]:
qplot(wt,mpg,data=df)
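The relationship this plot shows between wt and mpg can also be checked numerically. As a minimal sketch that is not part of the original walkthrough, base R's cor() computes the Pearson correlation between the two plotted columns; the variable name r is only illustrative.

```r
# Pearson correlation between car weight (wt) and fuel economy (mpg)
df <- mtcars
r <- cor(df$wt, df$mpg)

# A negative value means heavier cars tend to have lower mpg
print(round(r, 2))
```

A value close to -1 or 1 suggests a strong linear relationship, while a value near 0 suggests little linear relationship.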

Adding a 3rd feature

We can show a third feature by mapping it to a color gradient, or by sizing each point according to its value of that feature. For example:

In [15]:
qplot(wt,mpg,data=df,color=cyl)
In [17]:
qplot(wt,mpg,data=df,size=cyl)

Or both at once:

In [18]:
qplot(wt,mpg,data=df,size=cyl,color=cyl)
In [21]:
# Show 4 features (this gets messy)
# I() sets alpha to a constant instead of mapping it, which avoids an extra alpha legend
qplot(wt,mpg,data=df,size=cyl,color=hp,alpha=I(0.6))

ggplot()

Now let's see how to get more control by using ggplot():

In [30]:
pl <- ggplot(data=df,aes(x = wt,y=mpg)) 
pl + geom_point()

Adding a 3rd feature

In [32]:
pl <- ggplot(data=df,aes(x = wt,y=mpg)) 
pl + geom_point(aes(color=cyl))
In [37]:
pl <- ggplot(data=df,aes(x = wt,y=mpg))

pl + geom_point(aes(color=factor(cyl)))
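The two calls above differ only in factor(). The cyl column is stored as a number, so ggplot2 gives it a continuous color gradient; wrapping it in factor() makes it discrete, so each level gets its own distinct color. A quick check of the column, as a sketch using only base R (the names cyl_class and cyl_levels are illustrative):

```r
df <- mtcars

# How R stores the column: numeric values are treated as continuous
cyl_class <- class(df$cyl)
print(cyl_class)

# After conversion, the column has a small set of discrete levels
cyl_levels <- levels(factor(df$cyl))
print(cyl_levels)
```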
In [38]:
# Note: ggplot2 warns that using size for a discrete variable is not advised
pl <- ggplot(data=df,aes(x = wt,y=mpg))

pl + geom_point(aes(size=factor(cyl)))
In [47]:
# With Shapes
pl <- ggplot(data=df,aes(x = wt,y=mpg))

pl + geom_point(aes(shape=factor(cyl)))
In [49]:
# Better version: shapes plus color, with larger, semi-transparent points
pl <- ggplot(data=df,aes(x = wt,y=mpg))

pl + geom_point(aes(shape=factor(cyl),color=factor(cyl)),size=4,alpha=0.6)
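The call above mixes two ways of styling points: shape and color sit inside aes(), so they vary with the data, while size and alpha sit outside aes(), so they are fixed for every point. The sketch below separates the two cases and uses layer_data() to inspect the values ggplot2 computed; the names p_mapped, p_set, mapped_colours and set_sizes are illustrative.

```r
library(ggplot2)
df <- mtcars

# Mapped inside aes(): color varies with the data and gets a legend
p_mapped <- ggplot(df, aes(x = wt, y = mpg)) +
  geom_point(aes(color = factor(cyl)))

# Set outside aes(): every point gets the same fixed size and transparency
p_set <- ggplot(df, aes(x = wt, y = mpg)) +
  geom_point(size = 4, alpha = 0.6)

# layer_data() returns the per-point values ggplot2 computed for a layer
mapped_colours <- unique(layer_data(p_mapped)$colour)
set_sizes <- unique(layer_data(p_set)$size)

print(mapped_colours)  # one color per cyl level
print(set_sizes)       # a single constant size
```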

Gradient Scales

In [50]:
pl + geom_point(aes(colour = hp),size=4) + scale_colour_gradient(high='red',low = "blue")
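To see what the low and high arguments do, the sketch below rebuilds the same plot in a self-contained form and compares the colors assigned to the least and most powerful cars. It assumes that the rows returned by layer_data() follow the row order of df; the names p, cols, low_hp_colour and high_hp_colour are illustrative.

```r
library(ggplot2)
df <- mtcars

p <- ggplot(df, aes(x = wt, y = mpg)) +
  geom_point(aes(colour = hp), size = 4) +
  scale_colour_gradient(low = "blue", high = "red")

# Colors computed for each point (assumed to be in the same order as df)
cols <- layer_data(p)$colour

# The car with the lowest hp sits at the "low" end of the gradient,
# the car with the highest hp at the "high" end
low_hp_colour <- cols[which.min(df$hp)]
high_hp_colour <- cols[which.max(df$hp)]

print(low_hp_colour)
print(high_hp_colour)
```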

Great! That's it for scatterplots. Remember to reference the cheat sheet or the documentation for more details!